Outlier Detection for Support Vector Machine using Minimum Covariance Determinant Estimator

Authors

  • M. Mohammadi Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad, Iran.
  • M. Sarmad Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad, Iran.
Abstract:

The purpose of this paper is to identify the points that affect the performance of one of the important data-mining algorithms, namely the support vector machine (SVM). The final classification decision is based on a small portion of the data, called support vectors. Therefore, atypical observations among these points can cause the classifier to deviate from the correct decision. Accordingly, this paper employs Debruyne’s idea of an “outlier map” to identify outlying points in the SVM classification problem. However, for computational reasons such as convenience and speed, a robust Mahalanobis distance based on the minimum covariance determinant (MCD) estimator is used instead. This approach is well suited to data with a low-dimensional structure. In addition to classification accuracy, the margin width is used as a criterion for assessing performance; a larger margin is preferable because it indicates higher generalization ability. Removing the outliers detected by the proposed outlier map increases both the generalization ability and the accuracy of the SVM, which leads to the conclusion that the proposed method is very effective at identifying outliers. Tests on simulated and real-world data show that this new version of the outlier map retains the older version's ability to recognize outlying and misclassified observations.
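The core step described above, flagging atypical points with a robust Mahalanobis distance built on the MCD estimator, can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the authors' implementation: the simplified FAST-MCD search (random starts refined by C-steps, with no reweighting), the median-based consistency rescaling, the Wilson-Hilferty approximation to the chi-square quantile, the 0.975 cutoff, and all function names are assumptions. The `margin_width` helper encodes the standard linear-SVM margin 2/‖w‖ used as a performance criterion.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(p, q):
    """Wilson-Hilferty approximation to the q-quantile of a chi-square with p dof."""
    z = NormalDist().inv_cdf(q)
    a = 2.0 / (9.0 * p)
    return p * (1.0 - a + z * np.sqrt(a)) ** 3


def sq_mahalanobis(X, mu, cov):
    """Squared Mahalanobis distance of every row of X from mu under cov."""
    diff = X - mu
    return np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)


def fast_mcd(X, h=None, n_starts=30, max_csteps=50, seed=0):
    """Simplified FAST-MCD: random (p+1)-subsets refined by C-steps.

    Returns the mean and raw covariance of the h-subset with the smallest
    covariance determinant found (no reweighting step).
    """
    n, p = X.shape
    h = (n + p + 1) // 2 if h is None else h
    rng = np.random.default_rng(seed)
    best_det, best_idx = np.inf, None
    for _ in range(n_starts):
        idx = np.sort(rng.choice(n, size=p + 1, replace=False))
        try:
            for _ in range(max_csteps):
                # C-step: fit on the current subset, keep the h closest points.
                d2 = sq_mahalanobis(X, X[idx].mean(axis=0), np.cov(X[idx], rowvar=False))
                new_idx = np.sort(np.argsort(d2)[:h])
                if np.array_equal(new_idx, idx):
                    break  # subset stopped changing
                idx = new_idx
        except np.linalg.LinAlgError:
            continue  # singular subset covariance; try another start
        det = np.linalg.det(np.cov(X[idx], rowvar=False))
        if det < best_det:
            best_det, best_idx = det, idx
    return X[best_idx].mean(axis=0), np.cov(X[best_idx], rowvar=False)


def flag_outliers(X, q=0.975, **mcd_kwargs):
    """Return (is_outlier, robust_distance) for each row of X."""
    p = X.shape[1]
    mu, cov = fast_mcd(X, **mcd_kwargs)
    # The raw h-subset covariance is too small; rescale it so the median
    # squared distance matches the chi-square median (consistency correction).
    cov = cov * np.median(sq_mahalanobis(X, mu, cov)) / chi2_quantile(p, 0.5)
    d2 = sq_mahalanobis(X, mu, cov)
    return d2 > chi2_quantile(p, q), np.sqrt(d2)


def margin_width(w):
    """Margin width 2 / ||w|| of a linear SVM with decision function w.x + b."""
    return 2.0 / np.linalg.norm(w)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(size=(95, 2)),            # bulk of the data
                   rng.normal(loc=8.0, size=(5, 2))])   # planted outliers
    flags, dist = flag_outliers(X)
    print("flagged rows:", np.flatnonzero(flags))
```

One plausible way to mirror the evaluation in the abstract (an assumption, not a procedure taken from the paper) is to apply `flag_outliers` within each class, drop the flagged points, refit the SVM, and compare accuracy and `margin_width` before and after. For a linear scikit-learn `SVC`, the weight vector is `clf.coef_[0]`.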

Journal

Volume 7, Issue 2

Pages 299–309

Publication date: 2019-04-01